[Python] Detect destination of shortened, or "tiny" url

Posted by conradlee on Stack Overflow
Published on 2010-03-16T12:11:10Z Indexed on 2010/03/16 12:16 UTC

I have just scraped a bunch of Google Buzz data, and I want to know which Buzz posts reference the same news articles. The problem is that many of the links in these posts have been modified by URL shorteners, so many distinct shortened URLs may all point to the same news article.

Given that I have millions of posts, what is the most efficient way (preferably in Python) for me to

  1. detect whether a URL is a shortened URL (from any of the many URL shortening services, or at least the largest ones)
  2. find the "destination" of the shortened URL, i.e., the long, original version of the shortened URL?
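
To make the two steps concrete, here is a minimal sketch of what I have in mind, not a definitive implementation. It assumes a hand-maintained set of shortener hosts (`SHORTENER_HOSTS` is an illustrative, incomplete list) and assumes the services answer a `HEAD` request with a redirect status and a `Location` header. The names `is_shortened` and `expand` are hypothetical.

```python
import http.client
from urllib.parse import urljoin, urlsplit

# Assumption: an illustrative, incomplete list of shortener hosts.
SHORTENER_HOSTS = {"bit.ly", "tinyurl.com", "goo.gl", "is.gd", "ow.ly", "t.co"}

REDIRECT_CODES = (301, 302, 303, 307, 308)


def is_shortened(url):
    """Step 1: return True if the URL's host is a known shortener."""
    host = (urlsplit(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return host in SHORTENER_HOSTS


def expand(url, max_hops=5, timeout=10):
    """Step 2: follow redirects with HEAD requests and return the last URL seen."""
    for _ in range(max_hops):
        parts = urlsplit(url)
        if parts.scheme == "https":
            conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
        else:
            conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
        try:
            path = parts.path or "/"
            if parts.query:
                path += "?" + parts.query
            # HEAD avoids downloading the article body.
            conn.request("HEAD", path)
            response = conn.getresponse()
            status = response.status
            location = response.getheader("Location")
        finally:
            conn.close()
        if status not in REDIRECT_CODES or not location:
            return url  # no further redirect: treat this as the destination
        url = urljoin(url, location)  # Location may be relative
    return url
```

With millions of posts, it would probably pay to collect the distinct shortened URLs first and call `expand` once per distinct URL, keeping the results in a dictionary.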

Does anyone know if the URL shorteners impose strict request rate limits? If I keep this down to 100 requests per second (all coming from the same IP address), do you think I'll run into trouble?
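
Whatever the actual limits turn out to be, I assume I would cap the request rate on my side. A rough sketch of such a cap follows; the `Throttle` class is hypothetical, and the 100 per second figure is only the number from my question, not a limit documented by any service.

```python
import time


class Throttle:
    """Space out calls so that at most max_per_second happen per second."""

    def __init__(self, max_per_second, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self.clock = clock  # injectable so the class can be tested without waiting
        self.sleep = sleep
        self.next_allowed = None

    def wait(self):
        """Block until the next request is allowed, then reserve the next slot."""
        now = self.clock()
        if self.next_allowed is not None and now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.min_interval


# Usage: call throttle.wait() before each outgoing request.
throttle = Throttle(max_per_second=100)
```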

© Stack Overflow or respective owner
